Accelerator devices are increasingly used to build large supercomputers, and current installations usually include more than one accelerator per system node. To keep all devices busy, kernels have to be executed concurrently, which can be achieved via asynchronous kernel launches. Our work compares the performance of an implementation of the Conjugate Gradient method with CUDA, OpenCL, and OpenACC on NVIDIA Pascal GPUs. Furthermore, it takes a look at Intel Xeon Phi coprocessors when programmed with OpenCL and OpenMP. In doing so, it tries to answer the question of whether the higher abstraction level of directive-based models is inferior to lower-level paradigms in terms of performance.
This archive contains the modifications to liboffload, al...
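The first abstract above names the Conjugate Gradient (CG) method as its benchmark but does not show it. As a rough illustration of what such a benchmark computes, here is a minimal serial C sketch for a small dense symmetric positive definite system. It is not the authors' CUDA, OpenCL, or OpenACC code: the dense row-major storage, the names `cg_solve` and `CG_MAX_N`, and the residual-based stopping rule are assumptions made for illustration. Each loop (matrix-vector product, dot product, vector update) is the kind of operation a GPU implementation would typically turn into a separate kernel launch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical upper bound on the system size so work vectors fit on the stack. */
#define CG_MAX_N 64

/* Dot product of two length-n vectors. */
static double dot(const double *a, const double *b, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}

/* Dense matrix-vector product y = A x, with A stored row-major (n x n). */
static void matvec(const double *A, const double *x, double *y, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        double s = 0.0;
        for (size_t j = 0; j < n; ++j) s += A[i * n + j] * x[j];
        y[i] = s;
    }
}

/* Solve A x = b for symmetric positive definite A with the CG method.
 * x holds the initial guess on entry and the approximate solution on exit.
 * Stops when ||r||_2 <= tol or after max_iter iterations.
 * Returns the number of iterations performed. */
int cg_solve(const double *A, const double *b, double *x, size_t n,
             double tol, int max_iter) {
    assert(n > 0 && n <= CG_MAX_N);
    double r[CG_MAX_N], p[CG_MAX_N], Ap[CG_MAX_N];

    /* r0 = b - A x0 and first search direction p0 = r0 */
    matvec(A, x, Ap, n);
    for (size_t i = 0; i < n; ++i) {
        r[i] = b[i] - Ap[i];
        p[i] = r[i];
    }
    double rr = dot(r, r, n);

    int k = 0;
    /* Compare squared norms to avoid calling sqrt (no libm needed). */
    while (k < max_iter && rr > tol * tol) {
        matvec(A, p, Ap, n);                   /* matrix-vector kernel */
        double alpha = rr / dot(p, Ap, n);     /* step length along p */
        for (size_t i = 0; i < n; ++i) {       /* axpy-style vector updates */
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        double rr_new = dot(r, r, n);
        double beta = rr_new / rr;             /* keeps directions A-conjugate */
        for (size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
        ++k;
    }
    return k;
}
```

For example, with A = [[4, 1], [1, 3]], b = (1, 2), and x0 = 0, `cg_solve(A, b, x, 2, 1e-10, 100)` returns 2 and leaves x close to (1/11, 7/11), matching the property that CG reaches the exact solution of an n-dimensional system in at most n iterations in exact arithmetic.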
Recently, OpenCL, a new open programming standard for GPGPU programming, has become availa...
This paper presents a performance comparison between CUDA and OpenACC. The performance analysis focu...
Scientific developers face challenges adapting software to leverage increasingly heterogeneous archi...
During the past decade, accelerators, such as NVIDIA CUDA GPUs and Intel Xeon Phis, have seen an inc...
The next generation of supercomputers will feature a diverse mix of accelerator devices. The increas...
In the past decade, accelerators, commonly Graphics Processing Units (GPUs), have played a key role ...
The OpenCL standard allows targeting a large variety of CPU, GPU and accelerator architectures using...
The trend of using co-processors as accelerators to perform certain tasks is rising in the parallel...
Accelerated computing is becoming more diverse as new vendors and architectures come into play. Alth...
This whitepaper investigates the parallel performance of a sample application that implements an app...
The decline of Moore’s law has led to a fundamental shift in the design of micro-processor architect...
The proliferation of accelerators in modern clusters makes efficient coprocessor programming a key r...
Accelerator processors allow energy-efficient computation at high performance, especially for comput...
In recent years, GPU computing has been very popular for scientific applications, especially after t...